QMDP-Net: Deep Learning for Planning under Partial Observability
Abstract
This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and “transfer” to other similar tasks beyond the set. In preliminary experiments, the QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, although the QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.
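For readers unfamiliar with the QMDP algorithm that the abstract refers to, the following is a minimal sketch of the classical, hand-specified QMDP approximation. It is not the QMDP-net architecture and not code from the paper. The function names, the array layouts (T[a][s][s2], R[s][a], Z[a][s2][o]), and the two-state toy problem are all assumptions made for illustration.

```python
# Sketch of the classical QMDP approximation (illustrative only; the
# names, layouts, and toy problem below are assumptions, not the paper's code).
# T[a][s][s2]: transition probability, R[s][a]: reward,
# Z[a][s2][o]: probability of observing o after action a lands in state s2.

def qmdp_q_values(T, R, gamma=0.95, iters=200):
    """Value iteration on the underlying fully observable MDP; returns Q[s][a]."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                expected_next = sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
                Q[s][a] = R[s][a] + gamma * expected_next
        V = [max(row) for row in Q]
    return Q

def belief_update(b, a, o, T, Z):
    """Bayes filter step: predict with T, weight by Z, then normalize."""
    n = len(b)
    unnorm = [
        Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnorm]

def qmdp_action(b, Q):
    """Pick the action maximizing the belief-weighted Q-value."""
    n_actions = len(Q[0])
    scores = [sum(b[s] * Q[s][a] for s in range(len(b))) for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)

# Hypothetical toy problem: the hidden state is which of two sides holds a goal.
# The state never changes, action i guesses side i, and a sensor is right 80% of the time.
identity = [[1.0, 0.0], [0.0, 1.0]]
T = [identity, identity]
R = [[1.0, 0.0], [0.0, 1.0]]      # reward 1 when the guess matches the state
sensor = [[0.8, 0.2], [0.2, 0.8]]
Z = [sensor, sensor]

Q = qmdp_q_values(T, R)
belief = belief_update([0.5, 0.5], a=0, o=1, T=T, Z=Z)
action = qmdp_action(belief, Q)
```

In this sketch the model (T, R, Z) is fixed by hand. The abstract's point is that the QMDP-net instead embeds this model-plus-planner structure in a differentiable network and learns it end to end.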
Similar resources
Learning Plans for Safety and Reachability Goals with Partial Observability
Traditional planning assumes reachability goals and/or full observability. In this paper, we propose a novel solution for safety and reachability planning with partial observability. Given a planning domain, a safety property, and a reachability goal, we automatically learn a safe and permissive plan to guide the planning domain so that the safety property is not violated and which can force th...
Planning with Extended Goals and Partial Observability
Planning in nondeterministic domains with temporally extended goals under partial observability is one of the most challenging problems in planning. Simpler subsets of this problem have been already addressed in the literature, but the general combination of extended goals and partial observability is, to the best of our knowledge, still an open problem. In this paper we present a first attempt...
Product Representation of Belief Spaces in Planning under Partial Observability
We present a product representation of belief spaces for planning under partial observability. In earlier work we investigated backward plan construction based on a combination operation for belief states. The main problem in explicit construction of belief states is their high number. To remedy this problem, we refrain from representing individual belief states explicitly, and instead represen...
LTLf and LDLf Synthesis under Partial Observability
In this paper, we study synthesis under partial observability for logical specifications over finite traces expressed in LTLf/LDLf. This form of synthesis can be seen as a generalization of planning under partial observability in nondeterministic domains, which is known to be 2EXPTIME-complete. We start by showing that the usual “belief-state construction” used in planning under partial observ...
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to ...